Visiting Gafni’s Reduction Land:
from the BG Simulation to the Extended BG Simulation

Damien Imbs, Michel Raynal
damien.imbs@irisa.fr, raynal@irisa.fr

Abstract: The Borowsky-Gafni (BG) simulation algorithm is a powerful tool that allows a set of t + 1 asynchronous sequential
processes to wait-free simulate (i.e., despite the crash of up to t of them) a large number n of processes under the assumption that at
most t of these processes fail (i.e., the simulated algorithm is assumed to be t-resilient). The BG simulation has been used to prove
solvability and unsolvability results for crash-prone asynchronous shared memory systems.

In its initial form, the BG simulation applies only to colorless decision tasks, i.e., tasks in which nothing prevents processes from deciding
the same value (e.g., consensus or k-set agreement tasks). Said another way, it does not apply to decision problems such as renaming,
where no two processes are allowed to decide the same new name. Very recently (STOC 2009), Eli Gafni presented an extended
BG simulation algorithm (GeBG) that generalizes the basic BG algorithm by extending it to “colored” decision tasks such as renaming.
His algorithm is based on a sequence of sub-protocols where a sub-protocol is either the base agreement protocol that is at the core of
the BG simulation, or a commit-adopt protocol.

This paper presents the core of an extended BG simulation algorithm that is particularly simple. This algorithm is based on two 
underlying objects: the base agreement object used in the BG simulation (as does GeBG), and (differently from GeBG) a new simple 
object that we call arbiter. As in GeBG, while each of the n simulated processes is simulated by each simulator, each of the first t + 1 
simulated processes is associated with a predetermined simulator that we call its “owner”. The arbiter object is used to ensure that the
permanent blocking (crash) of any of these t + 1 simulated processes can only be due to the crash of its owner simulator. After being 
presented in a modular way, the proposed extended BG simulation algorithm is proved correct. 

Key-words: Arbiter, Asynchronous processes, Distributed computability, Fault-Tolerance, Process crash failure, Shared memory
system, Wait-free environment, Reduction, t-Resilience.


From the BG Simulation to the Extended BG Simulation
Summary: In this report, we describe how to go from the BG simulation to the extended BG simulation.

Keywords: Arbiter, Asynchronous processes, Distributed computability, Fault tolerance, Crash failure, Reduction, t-Resilience,
Shared memory system, Wait-free environment.

1 Introduction

What is the Borowsky-Gafni (BG) simulation Considering an asynchronous system where processes can crash, the (n, k)-set agreement
problem is a basic decision task defined as follows [9]. Each of the n processes proposes a value, and every process that does
not crash has to decide a value (termination), such that a decided value is a proposed value (validity) and at most k different values are
decided (agreement). The consensus problem corresponds to the particular case k = 1.
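
The two safety properties of the definition above can be checked mechanically on a finished run. The following is a minimal sketch, not taken from the paper: the function name check_set_agreement and the dictionary-based encoding of a run are assumptions made for illustration. Termination is a liveness property and is deliberately not checked.

```python
from typing import Dict, Hashable


def check_set_agreement(proposed: Dict[int, Hashable],
                        decided: Dict[int, Hashable],
                        k: int) -> bool:
    """Check validity and agreement of (n, k)-set agreement on one run.

    proposed maps each process id to the value it proposed.
    decided maps each process id that decided to its decided value
    (processes that crashed before deciding are simply absent).
    """
    proposed_values = set(proposed.values())
    decided_values = set(decided.values())
    validity = decided_values <= proposed_values   # every decided value was proposed
    agreement = len(decided_values) <= k           # at most k distinct decided values
    return validity and agreement
```

With k = 1 the same check expresses the safety part of consensus.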

A fundamental question related to distributed computability is the following. Suppose we have an algorithm that solves the (15, 4)-
set agreement problem. Can we use this algorithm as a subroutine to solve the (12, 5)-set agreement problem, assuming that at most
t < 12 processes can crash? Intuitively, the answer might be “yes” (as we have fewer processes and more decided values are allowed).
Let us now suppose that we want to use the same (15, 4)-set agreement subroutine to solve the (100, 4)-set agreement problem. As we
have many more proposed values, and the same constraint on the number of decided values, the answer is not intuitively
obvious. And what is the answer if we want to solve the (80, 7)-set agreement problem (many more proposed values but only two
more values can be decided), or (assuming t = 4) solve the (5, 4)-set agreement problem?

Stated in more general terms, the question is: “Can we use a solution to the (n, k)-set agreement problem as a subroutine to solve
the (n', k')-set agreement problem, when at most t < min(n, n') processes may crash?” (We say that “the (n', k')-set agreement is
reducible to (n, k)-set agreement”.) The BG simulation (introduced in [6] and formalized and deeply investigated in [7], where a
formal definition of “reducibility” is given) answers this fundamental question. It states that the answer is “yes” if k' ≥ k and “no” if k' ≤ t < k.
As we can see, the answer “yes” does not depend on the number of processes.

To that end, a BG simulation algorithm is described that allows n' = t + 1 processes to simulate a large number n of processes
that collectively solve a decision task in presence of at most t crashes. Each of the n' simulator processes simulates all the n processes.
These n' simulator processes cooperate through underlying objects (the type of which is called here safe_agreement) that allow them to
agree on a single output for each of the non-deterministic statements issued by every simulated process.

The important lesson learned from the BG simulation is that, in a failure-prone context, what is important is not the number of
processes but the maximal number of possible failures and the actual number of values that are proposed to a decision task. An
interesting application of the BG simulation (among several of its applications [7]) is the proof that there is no t-resilient (n, k)-set
agreement algorithm for t ≥ k. This is obtained as follows. As (1) the BG simulation allows reducing the (k + 1, k)-set agreement
problem to the (n, k)-set agreement problem in a system with up to k failures, and (2) the (k + 1, k)-set agreement problem is known to
be impossible in presence of k failures [6, 13, 16], it follows that there is no k-resilient (n, k)-set agreement algorithm.

The limit of the BG-simulation and the extended BG-simulation The BG simulation characterizes t-resilient solvability by reducing
it to the question of wait-free solvability (i.e., t-resilience in a system of n = t + 1 processes). Unfortunately, the BG simulation is
limited to colorless decision tasks, i.e., tasks in which if a process decides a value v, then all the processes can decide that value (the
class of colorless tasks is formally defined in [7]). The (n, k)-set agreement problem is typically such a task. From an operational point of
view, this is due to the fact that, in the BG simulation, each simulator simulates fairly all the processes, and consequently, the crash of a
simulator process can manifest itself as the crash of any simulated process (the one for which it is currently simulating a critical part of the code).

The extended BG simulation has been proposed by Eli Gafni to overcome this limitation and consequently fully capture t-resilience
[12]. As stated in [12], “With the extended BG simulation we can reduce questions about t-resilience solvability to questions about
wait-free solvability. The latter is characterized by the Herlihy-Shavit conditions [13]”.

As a result, it applies to both colorless tasks and colored decision tasks such as the renaming problem [3]. In that problem, each 
of the n processes has to decide a new name (from a given new name space) such that no two processes have the same new name. 
This problem has wait-free solutions when the new name space [1..M] is such that M ≥ 2n − 1 (see [8] for a deeper insight into the
problem). 

In his paper [12], Gafni presents several (un)decidability results that can be obtained in a simpler way from the BG simulation. He 
also uses the extended BG simulation to show that the t-resilient weak symmetry breaking problem is equivalent to the t-resilient weak
renaming problem.

The core of the BG simulation relies on the following principles: (1) each of the (t + 1) simulators fairly simulates all the processes,
and this simulation is such that (2) the crash of a simulator entails the crash of at most one simulated process. The BG simulation is
“symmetric” in the sense that each of the n processes is simulated by every simulator, and the (t + 1) simulators are “equal” with respect
to each simulated process. One way to be able to simulate colored tasks (without preventing the simulation of colorless tasks) consists
in introducing some form of asymmetry [12].

The extended BG simulation realizes the appropriate asymmetry as follows. As in the BG simulation, each simulator process q
simulates all the processes, but it is associated with a given simulated process p (in our terminology, q is the owner of p). The
ownership notion is then used to ensure that the corresponding simulated process p will not be blocked forever (perceived as crashed) if its
owner simulator q does not crash. Hence, if a simulator does not crash, it can always decide the value decided by the simulated process
p it “owns”. As noticed and demonstrated in [12], “extending the BG simulation by this simple property results in a full characterization
of t-resilience in terms of wait-freedom”.

Content of the paper In addition to the introduction of the notion of extended BG simulation, and a full characterization of
t-resilience, Gafni presents in [12] an extended BG simulation algorithm (denoted here GeBG). This algorithm is based on a sequence of
sub-protocols where each sub-protocol is either the base agreement protocol used in the BG simulation (safe_agreement type objects) 
or a commit-adopt protocol [11]. This algorithm is presented informally in English. 

The present paper presents the core of an extended BG simulation algorithm that is particularly simple. This algorithm is based on 
two underlying object types: the type safe_agreement (the one used in the BG simulation algorithm and in GeBG), and (differently from 
GeBG) an object type that we call arbiter. An arbiter object allows exploiting the ownership notion in a simple way to ensure that (1) an 
object value is always decided when its owner does not crash, and (2) the value of that object is determined either by its owner simulator 
or by the other simulators. 

As far as the whole simulation is concerned, while (as in the BG simulation) each of the n simulated processes is simulated by 
each simulator, (as in GeBG) each of the first t + 1 simulated processes is “associated” with exactly one simulator (its “owner”). As 
already said, it follows from the appropriate use of the arbiter objects that the permanent blocking (crash) of any of these t + 1 simulated
processes can only be due to the crash of its owner simulator. 

The paper is made up of 7 sections. Section 2 presents the model and the definition of decision tasks. Section 3 explains the structure 
of the simulation. Section 4 defines the base object types used by the simulators to cooperate and realize a correct simulation. Then, the 
extended BG simulation algorithm is presented in an incremental and modular way. First Section 5 briefly presents the BG simulation 
algorithm, and then Section 6 enriches it to solve the extended BG simulation. This algorithm is proved in Section 7. 

2 Solving decision tasks 

2.1 Decision tasks 

The problems we are interested in are called decision tasks¹. In every run, each process proposes a value and the proposed values define
an input vector I where I[j] is the value proposed by pj. Let X denote the set of allowed input vectors. Each process has to decide a
value. The decided values define an output vector O, such that O[j] is the value decided by pj. Let 𝒪 be the set of the output vectors.

A decision task is a binary relation Δ from X into 𝒪. A task is colorless if, when a value v is decided by a process pj (i.e., O[j] = v),
v can also be decided by all the processes. Consensus, and more generally k-set agreement, are colorless tasks. Otherwise the task is
colored. Renaming is a colored task.

2.2 The computation model 

Asynchronous processes and fault model We are interested in distributed algorithms the aim of which is to solve a task in a system
made up of n asynchronous sequential processes denoted p1, ..., pn. A process executes a sequence of atomic steps (as defined by its
algorithm). Each process pj is endowed with a write-once local variable output_j where it deposits the value it decides.

A process can crash in a run. A process correctly executes the steps defined by its algorithm until it crashes (if it ever does). After it
has crashed, a process executes no more steps. If it does not crash, a process executes an infinite number of steps.

It is assumed that an arbitrary subset (not known in advance) of up to t < n processes can crash (the crash of one process being 
independent from the crash of other processes). A process that does not crash in a run is said to be correct in that run, otherwise it is 
faulty. This failure model is called the t-resilient environment, and an algorithm designed for such an environment is said to be t-resilient. 
The extreme case t = n − 1 is called the wait-free environment, and the corresponding algorithms are called wait-free algorithms.

Communication model The n processes cooperate through a shared memory made up of a snapshot object [1] denoted mem. This
means that a process pj can write only the entry mem[j] but can read all the entries by invoking the operation mem.snapshot(). The
write and snapshot operations appear as being executed atomically [1]. (These operations can be built on top of single-writer/multiple-
readers atomic registers [1, 4].) Initially, mem[j] = ⊥.
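
To make the interface of the snapshot object concrete, here is a minimal sketch. It is an illustration only: the class name SnapshotMemory is hypothetical, entries are indexed from 0, None plays the role of ⊥, and atomicity is obtained with a lock, which is an assumption of this sketch and not the register-based wait-free constructions of [1, 4].

```python
import threading
from typing import Any, List, Optional


class SnapshotMemory:
    """Lock-based stand-in for an atomic snapshot object (illustration only)."""

    def __init__(self, n: int) -> None:
        self._entries: List[Optional[Any]] = [None] * n   # None plays the role of ⊥
        self._lock = threading.Lock()

    def write(self, j: int, value: Any) -> None:
        """Process p_j writes its own entry mem[j]."""
        with self._lock:
            self._entries[j] = value

    def snapshot(self) -> List[Optional[Any]]:
        """Return a copy of all the entries, taken in one atomic step."""
        with self._lock:
            return list(self._entries)
```

Returning a copy matters: the caller obtains a value that later writes cannot modify, which is what the atomicity of snapshot() promises.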

Definition The previous computation model (asynchronous crash-prone processes that communicate through snapshot objects) is
called the snapshot model.

¹ The reader interested in a more formal presentation of decision tasks can consult the literature (e.g., [2, 7, 12, 13]).

2.3 Algorithm solving a task

An algorithm solves a task in a t-resilient environment if, given any I ∈ X, each correct process pj decides (i.e., writes a value v in
output_j) and there is an output vector O such that (I, O) ∈ Δ, where Δ is the relation defining the task and O is defined as follows.
If pj decides v, then O[j] = v. If pj does not decide, O[j] is set to any value v' that preserves the relation (I, O) ∈ Δ.

A task is solvable in a t-resilient environment if there is an algorithm that solves it in that environment. As an example, consensus is
not solvable in the 1-resilient environment [10, 14, 15]. Differently, renaming with 2n − 1 names is solvable in the wait-free environment
[5, 3, 13].

3 Simulated processes vs simulator processes 

Aim Let A be an n-process t-resilient algorithm that solves a decision task in the base snapshot model described previously. The aim
is to design a (t + 1)-process wait-free algorithm A' that simulates A in the same snapshot model. (The reader is referred to [7] for a
formal definition of a simulation.)

Notation A simulated process is denoted pj with 1 ≤ j ≤ n, and the subscript j is always used to refer to a simulated process.
Similarly, a simulator process (in short “simulator”) is denoted qi, with 1 ≤ i ≤ t + 1, and the subscript i is always used to refer to a
simulator.

As far as the objects accessed by the simulators are concerned, the following convention is adopted. The objects denoted with upper
case letters are the objects shared by the simulators. Differently an object denoted with lower case letters is local to a simulator (in that 
case, the associated subscript denotes the corresponding simulator). 

What a simulator does Each simulator qi is given the code of all the simulated processes p1, ..., pn. It manages n threads, each one
associated with a simulated process, and locally executes these threads in a fair way. It also manages a local copy mem_i of the snapshot
memory mem shared by the simulated processes.

The code of a simulated process pj contains writes of mem[j] and invocations of mem.snapshot(). These are the only operations
used by the processes p1, ..., pn to cooperate. So, the core of the simulation is the definition of two algorithms. The first (denoted
sim_write_{i,j}()) has to describe what a simulator qi has to do in order to correctly simulate a write of mem[j] issued by a process pj.
The second (denoted sim_snapshot_{i,j}()) has to describe what a simulator qi has to do in order to correctly simulate an invocation of
mem.snapshot() issued by a process pj.

4 Base object types used in the simulation 

In addition to snapshot objects, the simulator processes cooperate through atomic read/write register objects, and specific objects the 
types of which (safe_agreement and arbiter) are defined in this section. These types can be implemented from multi-reader/multi-writer 
atomic registers, which in turn can be implemented from snapshot objects. Hence, all the base objects used in the simulation can be 
implemented in the snapshot computation model described in the previous section. 

4.1 The safe_agreement object type

The safe_agreement type This object type (defined in [6, 7]) is at the core of the BG simulation. It provides each simulator qi with
two operations, denoted propose_i(v) and decide_i(), that qi can invoke at most once, and in that order. The operation propose_i(v) allows
qi to propose a value v while decide_i() allows it to decide a value. The properties satisfied by an object of the type safe_agreement,
owned by qj, are the following.

• Termination. If no simulator qx crashes while executing propose_x(v), then any correct simulator qi that invokes decide_i() returns
from that invocation.

• Agreement. At most one value is decided.

• Validity. A decided value is a proposed value.

init: for each x : 1 ≤ x ≤ t + 1 do SM[x] ← (⊥, 0) end for.

operation propose_i(v):   % 1 ≤ i ≤ t + 1 %

(01) SM[i] ← (v, 1);

(02) sm_i ← SM.snapshot();

(03) if (∃x : sm_i[x].level = 2) then SM[i] ← (v, 0) else SM[i] ← (v, 2) end if.

operation decide_i():   % 1 ≤ i ≤ t + 1 %

(04) repeat sm_i ← SM.snapshot() until (∀x : sm_i[x].level ≠ 1) end repeat;
(05) let x = min({k | sm_i[k].level = 2}); res ← sm_i[x].value;

(06) return(res).


Figure 1: An implementation of the safe_agreement type [7] (code for qi)


An implementation The implementation of the safe_agreement type described in Figure 1 is from [7]. This construction is based on
a snapshot object SM (with one entry per simulator qi). Each entry SM[i] of the snapshot object has two fields: SM[i].value that
contains a value and SM[i].level that stores its level. The level 0 means the corresponding value is meaningless, 1 means it is unstable,
while 2 means it is stable.

When a simulator qi invokes propose_i(v), it first writes the pair (v, 1) in SM[i] (line 01), and then reads the snapshot object SM
(line 02). If there is a stable value in SM, qi “cancels” the value it proposes, otherwise it makes it stable (line 03).

A simulator qi invokes decide_i() after it has invoked propose_i(v). Its aim is to return the same stable value to all the simulators that
invoke this operation (line 06). To that end, qi repeatedly computes a snapshot of SM until it sees no unstable value in SM (line 04).
Let us observe that, as a simulator qi invokes decide_i() after it has invoked propose_i(v), there is at least one stable value in SM when it
executes line 05. Finally, in order that the same stable value be returned to all, qi returns the stable value proposed by the simulator with
the smallest id (line 05).
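
The behavior described above can be mirrored in a short executable sketch. It follows the line numbering of Figure 1, but several choices are assumptions of the sketch and not part of the construction of [7]: the class name SafeAgreement, 0-based simulator ids, None standing for ⊥, and a lock used to make each write and each snapshot atomic.

```python
import threading
from typing import Any, List, Optional, Tuple


class SafeAgreement:
    """Sketch of Figure 1; a lock stands in for the atomic snapshot object SM."""

    def __init__(self, num_simulators: int) -> None:
        # Each entry is a (value, level) pair:
        # level 0 = meaningless, 1 = unstable, 2 = stable.
        self._sm: List[Tuple[Optional[Any], int]] = [(None, 0)] * num_simulators
        self._lock = threading.Lock()

    def _write(self, i: int, pair: Tuple[Optional[Any], int]) -> None:
        with self._lock:
            self._sm[i] = pair

    def _snapshot(self) -> List[Tuple[Optional[Any], int]]:
        with self._lock:
            return list(self._sm)

    def propose(self, i: int, v: Any) -> None:
        self._write(i, (v, 1))                        # line 01: v is unstable
        sm_i = self._snapshot()                       # line 02
        if any(level == 2 for _, level in sm_i):      # line 03
            self._write(i, (v, 0))                    # a stable value exists: cancel v
        else:
            self._write(i, (v, 2))                    # otherwise make v stable

    def decide(self, i: int) -> Any:
        while True:                                   # line 04: wait for "no unstable value"
            sm_i = self._snapshot()
            if all(level != 1 for _, level in sm_i):
                break
        # line 05: stable value of the simulator with the smallest id
        x = min(k for k, (_, level) in enumerate(sm_i) if level == 2)
        return sm_i[x][0]                             # line 06
```

The loop in decide() is where the conditional termination property shows up: if some simulator stops between its two writes in propose(), its entry stays at level 1 and every decide() on this object spins forever.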

A formal proof that this algorithm implements the safe_agreement type is given in [7]. For completeness, a proof is also given
in Appendix A.

4.2 The arbiter object type 

Definition Similarly to the objects of type safe_agreement, each object of the type arbiter has a statically predefined owner simulator
qj. Such an object provides the simulators with a single operation denoted arbitrate_{i,j}() (where i is the id of the invoking simulator and
j the id of the owner). A simulator qi invokes arbitrate_{i,j}() at most once, and, when it terminates, this invocation returns a value to qi.
The properties of an object of the type arbiter owned by qj are the following.

• Termination. If the owner qj invokes arbitrate_{j,j}() and is correct, or does not invoke arbitrate_{j,j}(), or if a simulator qi terminates
its invocation arbitrate_{i,j}(), then all the correct simulators return from their arbitrate_{i,j}() invocation.

• Agreement. No two processes return different values.

• Validity. The returned value is 1 (owner) or 0 (not_owner). Moreover, if the owner does not invoke arbitrate_{j,j}(), 1 cannot be
returned, and if only the owner invokes arbitrate_{j,j}(), 0 cannot be returned.

An implementation An implementation of an object of the type arbiter is described in Figure 2. It is based on a snapshot object 
PART (initialized to [false, ..., false]), and an atomic register WINNER (initialized to ⊥).

When it invokes arbitrate_{i,j}(), the simulator qi announces that it participates (line 01), and issues a snapshot to know the simulators
that are currently participating (line 02). If qi is the owner of the object (i = j, line 03), it checks if it is the first participant (predicate
part_i = {i}). If it is, it sets WINNER to 1, otherwise it sets it to 0 (line 04). If qi is not the owner of the object (i ≠ j), it checks if the
owner is a participating simulator (predicate j ∈ part_i). If it is, qi waits to know which value has been assigned to WINNER. If it is
not, it sets WINNER to 0 (line 05). Finally, qi terminates by returning the value of WINNER (line 07).
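
The steps just described can be sketched in executable form as follows. The line numbers in the comments refer to Figure 2. The class name Arbiter, the 0-based ids, the use of None for ⊥, the lock that makes each shared access atomic, and the polling loop that implements the wait statement are all assumptions of this sketch.

```python
import threading
import time
from typing import List, Optional


class Arbiter:
    """Sketch of an arbiter object owned by simulator `owner` (illustration only)."""

    def __init__(self, num_simulators: int, owner: int) -> None:
        self._owner = owner
        self._part: List[bool] = [False] * num_simulators   # snapshot object PART
        self._winner: Optional[int] = None                   # register WINNER, None = ⊥
        self._lock = threading.Lock()

    def arbitrate(self, i: int) -> int:
        with self._lock:
            self._part[i] = True                             # line 01: announce participation
        with self._lock:                                     # line 02: snapshot of PART
            part_i = {x for x, present in enumerate(self._part) if present}
        if i == self._owner:                                 # line 03
            outcome = 1 if part_i == {i} else 0              # line 04: was the owner first?
            with self._lock:
                self._winner = outcome
        elif self._owner in part_i:                          # line 05: owner is participating
            while True:                                      # wait until WINNER is not ⊥
                with self._lock:
                    if self._winner is not None:
                        break
                time.sleep(0.001)
        else:
            with self._lock:
                self._winner = 0                             # owner not seen: not_owner wins
        with self._lock:
            winner = self._winner                            # line 07
        assert winner is not None
        return winner
```

Only the branch taken by a non-owner that sees the owner in part_i can block, and it blocks only as long as the owner has not written WINNER, which matches the termination property stated for the type.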


operation arbitrate_{i,j}():   % 1 ≤ i, j ≤ t + 1 %

(01) PART[i] ← true;

(02) aux_i ← PART.snapshot(); part_i ← {x | aux_i[x]};

(03) if (i = j)   % qi is the owner of the associated arbiter type object %

(04)   then if (part_i = {i}) then WINNER ← 1 else WINNER ← 0 end if

(05)   else if (j ∈ part_i) then wait (WINNER ≠ ⊥) else WINNER ← 0 end if

(06) end if;

(07) return(WINNER).


Figure 2: The arbitrate_{i,j}() operation of the arbiter object type (code for qi)

A proof that this construction implements the arbiter object type is given in Appendix B.


5 The BG simulation 

This section presents the BG simulation [6, 7]: its main principles and the algorithms implementing its base operations sim_write_{i,j}()
and sim_snapshot_{i,j}().

5.1 The shared memory MEM[1..(t + 1)]

The snapshot memory mem shared by the processes p1, ..., pn is emulated by a snapshot object MEM shared by the simulators (so,
MEM has (t + 1) entries).

More specifically, MEM[i] is an atomic register that contains an array with one entry per simulated process pj. Each MEM[i][j]
is made up of two fields: MEM[i][j].value that contains the last value of mem[j] written by pj, and MEM[i][j].sn that contains the
associated sequence number. (This sequence number, introduced by the simulation, is control data that will be used to produce a
consistent simulation of the mem.snapshot() operations issued by the simulated processes pj.)

5.2 The sim_write_{i,j}() operation

The algorithm, denoted sim_write_{i,j}(v), executed by qi to simulate the write by pj of the value v into mem[j] is described in Figure 3
[7]. Its code is pretty simple. The simulator qi first increases a local sequence number w_sn_i[j] that will be associated with the value v
written by pj into mem[j]. Then, qi writes the pair (v, w_sn_i[j]) into mem_i[j] (where mem_i is its local copy of the memory shared by
the simulated processes) and finally writes atomically its local copy mem_i into MEM[i].


operation sim_write_{i,j}(v):

(01) w_sn_i[j] ← w_sn_i[j] + 1;

(02) mem_i[j] ← (v, w_sn_i[j]);

(03) MEM[i] ← mem_i.


Figure 3: sim_write_{i,j}(v) executed by qi to simulate write(v) issued by pj (from [7])
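
As an illustration, the three lines of Figure 3 can be transcribed as follows. The class name Simulator and the data layout are assumptions of this sketch: MEM is a plain Python list with one row per simulator (standing in for the snapshot object), ids are 0-based, and each entry is a (value, sn) pair.

```python
from typing import Any, List, Optional, Tuple

Entry = Tuple[Optional[Any], int]           # (value, sn)


class Simulator:
    """Local state of simulator q_i plus a reference to the shared MEM (hypothetical layout)."""

    def __init__(self, i: int, n: int, MEM: List[List[Entry]]) -> None:
        self.i = i
        self.MEM = MEM
        self.w_sn: List[int] = [0] * n               # w_sn_i[j]: writes simulated for p_j
        self.mem_i: List[Entry] = [(None, 0)] * n    # local copy of mem

    def sim_write(self, j: int, v: Any) -> None:
        self.w_sn[j] += 1                            # line 01
        self.mem_i[j] = (v, self.w_sn[j])            # line 02
        self.MEM[self.i] = list(self.mem_i)          # line 03: publish a copy of the local view
```

Publishing a copy at line 03 keeps later local updates from leaking into MEM[i] before the next simulated write.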


5.3 The sim_snapshot_{i,j}() operation

This operation is implemented by the algorithm described in Figure 4 [7]. 

Additional local and shared objects For each process pj, a simulator qi manages a local sequence number generator snap_sn_i[j]
used to associate a sequence number with each mem.snapshot() it simulates on behalf of pj (line 04).

In addition to the snapshot object MEM[1..(t + 1)], the simulators q1, ..., qt+1 cooperate through an array SAFE_AG[1..n, 0...]
of safe_agreement type objects.

Underlying principle of the BG simulation [6, 7]: obtaining a consistent value In order to agree on the very same output of the
snapsn-th invocation of mem.snapshot() that is issued by pj, the simulators q1, ..., qt+1 use the object SAFE_AG[j, snapsn].

Each simulator qi proposes a value (denoted input_i) to that object (line 05) and, due to its agreement property, that object
delivers the same output to all of them at line 06. In order to ensure the consistent progress of the simulation, the input value input_i proposed by
the simulator qi to SAFE_AG[j, snapsn] is defined as follows.

• First, qi issues a snapshot of MEM in order to obtain a consistent view of the simulation state. The value of this snapshot is kept
in sm_i (line 01).

Let us observe that sm_i[x][y] is such that (1) sm_i[x][y].sn is the number of writes issued by py into mem[y] that have been
simulated up to now by qx, and (2) sm_i[x][y].value is the value of the last write into mem[y] as simulated by qx on behalf of py.

• Then, for each py, qi computes input_i[y]. To that end, it extracts from sm_i[1..t + 1][y] the value written by the most advanced
simulator qs as far as the simulation of py is concerned. This is expressed in lines 02-03.

Once input_i has been computed, qi proposes it to SAFE_AG[j, snapsn] (line 05), and then returns the value decided by that object
(lines 06-07).
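
The computation of the proposed value (lines 02-03 of Figure 4) is a per-column maximum on sequence numbers. A minimal sketch follows; the function name compute_input and the encoding of the snapshot as a list of rows of (value, sn) pairs are assumptions made for illustration.

```python
from typing import Any, List, Tuple

Entry = Tuple[Any, int]     # (value, sn)


def compute_input(sm_i: List[List[Entry]], n: int) -> List[Any]:
    """For each simulated process p_y, keep the value held by the most advanced simulator.

    sm_i[x][y] is the entry of simulator q_x for simulated process p_y.
    """
    input_i = []
    for y in range(n):
        # Pick the entry whose sequence number for p_y is the largest (lines 02-03).
        value, _sn = max((row[y] for row in sm_i), key=lambda entry: entry[1])
        input_i.append(value)
    return input_i
```

When several simulators hold the same largest sequence number for py, max keeps the first of them; the sketch assumes, as in a consistent simulation, that such entries carry the same value.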

The previous description shows an important feature of the BG simulation. A value input_i[y] = sm_i[s][y].value proposed by a
simulator qi can be such that sm_i[s][y].sn > sm_i[i][y].sn, i.e., the simulator qs is more advanced than qi as far as the simulation of
py is concerned. This causes no problem: when qi later simulates mem.snapshot() operations for py (if any) that are between the
(sm_i[i][y].sn)-th and the (sm_i[s][y].sn)-th write operations of py, it will obtain a value that has already been computed and is currently
kept in the corresponding SAFE_AG[y, −] object.


operation sim_snapshot_{i,j}():

(01) sm_i ← MEM.snapshot();

(02) for each y : 1 ≤ y ≤ n do input_i[y] ← sm_i[s][y].value

(03)    where ∀x : 1 ≤ x ≤ t + 1 : sm_i[s][y].sn ≥ sm_i[x][y].sn end for;

(04) snap_sn_i[j] ← snap_sn_i[j] + 1; let snapsn = snap_sn_i[j];

(05) enter_mutex; SAFE_AG[j, snapsn].propose_i(input_i); exit_mutex;

(06) res ← SAFE_AG[j, snapsn].decide_i();

(07) return(res).


Figure 4: sim_snapshot_{i,j}() executed by qi to simulate mem.snapshot() issued by pj (from [7])

Underlying principle of the BG simulation [6, 7]: from wait-freedom to t-resilience Each simulator qi simulates the n processes
p1, ..., pn “in parallel” and in a fair way. But any qi can crash. The crash of qi while it is engaged in the simulation of
mem.snapshot() on behalf of several processes pj, pj', etc., can entail their definitive blocking, i.e., their crash. This is because
each SAFE_AG[j, −] object guarantees that its SAFE_AG[j, −].decide() invocations terminate only if no simulator crashes while
executing SAFE_AG[j, −].propose() (line 05 of Figure 4).

The simple (and bright) idea of the BG simulation to solve this problem consists in allowing a simulator to be engaged in only one
SAFE_AG[−, −].propose() invocation at a time. Hence, if qi crashes while executing SAFE_AG[j, −].propose(), it can entail the
crash of pj only. This is obtained by using an additional mutual exclusion object offering the operations enter_mutex and exit_mutex.
(Let us notice that such a mutex object is purely local to each simulator: it solves conflicts among the simulating threads inside each
simulator, and has nothing to do with the memory shared by the simulators.)
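
The role of the simulator-local mutex can be illustrated with threads standing for the simulated processes handled by one simulator. This is a sketch under stated assumptions: the names ProposeGuard and run_threads are hypothetical, the propose argument is an arbitrary callable standing in for SAFE_AG[j, snapsn].propose_i(), and the counters exist only to observe the "at most one propose at a time" property.

```python
import threading
import time
from typing import Any, Callable, List


class ProposeGuard:
    """Simulator-local mutex: at most one of its threads is inside propose()."""

    def __init__(self) -> None:
        self._mutex = threading.Lock()
        self._inside = 0
        self.max_inside = 0          # instrumentation for the example only

    def guarded_propose(self, propose: Callable[[Any], None], value: Any) -> None:
        with self._mutex:                                   # enter_mutex
            self._inside += 1
            self.max_inside = max(self.max_inside, self._inside)
            propose(value)                                  # stands for propose_i(input_i)
            self._inside -= 1
        # exit_mutex happens when the with-block is left


def run_threads(guard: ProposeGuard, num_threads: int) -> List[int]:
    """Run one thread per simulated process; each issues one guarded propose."""
    proposals: List[int] = []

    def slow_propose(v: int) -> None:
        time.sleep(0.001)            # widen the window in which overlap could occur
        proposals.append(v)

    threads = [threading.Thread(target=guard.guarded_propose, args=(slow_propose, j))
               for j in range(num_threads)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return proposals
```

Since a simulator can be stopped inside at most one propose, its crash leaves at most one safe_agreement object with an unstable value, hence blocks at most one simulated process.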

From t-resilience to wait-freedom As an example, let us consider that we have a t-resilient algorithm that solves the (n, t)-set agreement
problem. We obtain a wait-free algorithm that solves the (t + 1, t)-set agreement problem as follows. Each simulator qi (1 ≤ i ≤
t + 1) is initially given a proposed value vi, and the base objects SAFE_AG[1..n, 0] are used by the (t + 1) simulators as follows
to determine the value proposed by each pj. For each j, 1 ≤ j ≤ n, the simulator qi first invokes SAFE_AG[j, 0].propose_i(vi) and then
SAFE_AG[j, 0].decide_i(), which returns a value that qi considers as the value proposed by pj. It is easy to see that, for any j, all the
simulators obtain the same value for pj. Moreover, this value is one of the t + 1 values proposed by the simulators. Finally, simulator
process qi can decide any of the values decided by the processes pj it is simulating. (It is easy to see that the BG simulation is for
colorless decision tasks.) A formal proof of this reduction (based on input/output automata) can be found in [7].

From wait-freedom to t-resilience For colorless decision tasks, t-resilience can easily be reduced to wait-freedom as follows. First,
each application process deposits its input value in a shared register. Then, each of the t + 1 processes of the wait-free algorithm
takes one of those values as its input value and executes its code. Finally, each application process decides any value decided by a process
of the wait-free algorithm.

6 The extended BG simulation 

This section extends the previous algorithms in order to solve the extended BG simulation. Our aim is to obtain an implementation that
is “as simple as possible”. To that end, we proceed incrementally by “only” enriching the previous base BG simulation. The proposed
implementation uses the same snapshot object MEM and the same sim_write_{i,j}() operation (Figure 3) as the base BG simulation. It
also uses the same SAFE_AG[1..n, 0...] array made up of safe_agreement type objects.

This section presents the additional shared objects that are used, the underlying principles of the simulation, by a simulator qi, of a
mem.snapshot() invocation on behalf of a simulated process pj, and the algorithm (denoted e_sim_snapshot_{i,j}()) that
implements it.

6.1 The additional shared objects 

In addition to MEM and SAFE_AG[1..n, 0...], the memory shared by the simulators q1, ..., qt+1 contains the following objects.

• ARBITER[1..t + 1, 0...] is an array of arbiter objects. The objects ARBITER[j, −] are owned by the simulator qj (1 ≤ j ≤
t + 1).

The object ARBITER[j, snapsn] is used by a simulator when it simulates its snapsn-th invocation of mem.snapshot() on
behalf of the simulated process pj for 1 ≤ j ≤ t + 1. (As we will see, when t + 1 < j ≤ n, the simulation of mem.snapshot()
on behalf of pj does not require the help of an arbiter object.)

• ARB_VAL[1..t + 1, 0...][0..1] is an array of pairs of atomic registers. The pair of atomic registers ARB_VAL[j, snapsn][0..1] is
used in conjunction with the arbiter object ARBITER[j, snapsn].

The aim of ARB_VAL[j, snapsn][1] is to contain the value that has to be returned to the snapsn-th invocation of mem.snapshot(),
on behalf of the simulated process pj, if the owner qj is designated as the winner by the associated ARBITER[j, snapsn] object.
If the owner qj is not the winner, the value that has to be returned is the one kept in ARB_VAL[j, snapsn][0].
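
The way the pair ARB_VAL[j, snapsn][0..1] is indexed by the output of the arbiter can be written in a few lines. This sketch covers only the selection rule stated above, not the full algorithm of Figure 5; the function name value_to_return and the list encoding of the pair of registers are assumptions.

```python
from typing import Any, List, Optional


def value_to_return(arb_val: List[Optional[Any]], winner: int) -> Optional[Any]:
    """Select the register designated by the arbiter output.

    arb_val[1] holds the value to return if the owner wins (winner = 1);
    arb_val[0] holds the value to return otherwise (winner = 0).
    """
    if winner not in (0, 1):
        raise ValueError("an arbiter object returns 0 or 1")
    return arb_val[winner]
```

Because the arbiter returns the same value to every simulator, all of them read the same register of the pair.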

6.2 The e_sim_snapshot_{i,j}() operation

The enriched algorithm The code of the algorithm that implements the operation e_sim_snapshot_{i,j}(), executed by q_i to simulate a 
mem.snapshot() operation issued by p_j, is described in Figure 5. Its first four lines and its last line are exactly the same as in Figure 4. 
Lines 05-06 are replaced by the new lines N01-N11, which constitute the “addition” that allows going from the BG to the extended 
BG simulation. 

Underlying principle Although each simulated process p_j (1 ≤ j ≤ n) is simulated by each simulator q_i (1 ≤ i ≤ t + 1) as in the BG 
simulation, each simulated process p_j such that 1 ≤ j ≤ t + 1 is associated with exactly one simulator that is its “owner”: q_i is the 
owner of p_j if j = i (and also the owner of the corresponding objects ARBITER[j, -]). The aim is, for any snapsn > 0, to associate 
a single returned value with the snapsn-th invocations of e_sim_snapshot_{i,j}() issued by the simulators. The idea is to use the ownership 
notion to “shortcut” the use of the SAFE_AG[j, snapsn] object in appropriate circumstances. 

The operation e_sim_snapshot_{i,j}() for the simulated processes p_j such that t + 2 ≤ j ≤ n is exactly the same as sim_snapshot_{i,j}(). 
This appears in lines N01-N02, which are the same as lines 05-06 of Figure 4 (in that case, there is no ownership notion and 
consequently no simulator q_i ever invokes SAFE_AG[j, snapsn].abort_j()). 

The new lines N03-N10 address the case of the simulated processes owned by a simulator, i.e., the processes p_1, ..., p_{t+1}. The idea 
is the following: if q_i does not crash, p_i must not crash. In that way, if q_i is correct, p_i will always terminate whatever the behavior 
of the other simulators. To that end, q_i on one side, and all the other simulators on the other side, compete to define the snapshot 
value returned by the snapsn-th invocations of e_sim_snapshot_{i,j}() issued by each of them. To attain this goal, the additional objects 
ARBITER[j, snapsn] and ARB_VAL[j, snapsn] are used in the following way. 

All the simulators invoke ARBITER[j, snapsn].arbitrate_{i,j}() (at line N04 if q_i is the owner, and at line N09 if it is not). According 
to the specification of the arbiter type, these invocations do not return different values, and they do return at least when the owner q_j is correct 
and invokes that operation (as indicated in the specification, there are other cases where the invocations terminate). Finally, the value 
returned indicates whether the winner is the owner (1) or not (0). 

If the winner is the owner q_j, the value returned by the snapsn-th invocations of e_sim_snapshot_{i,j}() (one invocation per simulator) 
is the value input_j computed by the owner. That value is kept in the atomic register ARB_VAL[j, snapsn][1] (line N03). 

If the owner is not the winner, the value returned is the value determined by the other (non-owner) simulators that have invoked 
SAFE_AG[j, snapsn].propose_i(input_i) (line N07) and SAFE_AG[j, snapsn].decide_i() (line N08). The value they have computed, 
which is the result of the SAFE_AG[j, snapsn] object, has been deposited in ARB_VAL[j, snapsn][0] (line N08). 

It is important to notice that the owner q_j does not invoke propose() and decide() on the objects it owns. Moreover, the simulator 
q_j is the only one that can write ARB_VAL[j, snapsn][1], while the other simulators can write only ARB_VAL[j, snapsn][0]. 
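The three cases described above can be sketched as follows. This is only a sequential illustration of the control flow of lines N01-N10, under strong assumptions: `FirstProposalWins` and `FirstArrivalArbiter` are hypothetical stand-ins that behave like a safe_agreement object and an arbiter object only when invocations do not overlap, `Shared` is an invented container, and `mutex` models the local mutual exclusion of the calling simulator. None of these names come from the paper.

```python
import threading
from collections import defaultdict


class FirstProposalWins:
    """Sequential stand-in for a safe_agreement object (no concurrency)."""
    def __init__(self):
        self.value = None

    def propose(self, v):
        if self.value is None:
            self.value = v

    def decide(self):
        return self.value


class FirstArrivalArbiter:
    """Sequential stand-in for an arbiter: 1 if the owner arrives first, else 0."""
    def __init__(self):
        self.winner = None

    def arbitrate(self, is_owner):
        if self.winner is None:
            self.winner = 1 if is_owner else 0
        return self.winner


class Shared:
    """Invented container for the shared objects, indexed by (j, snapsn)."""
    def __init__(self):
        self.safe_ag = defaultdict(FirstProposalWins)
        self.arbiter = defaultdict(FirstArrivalArbiter)
        self.arb_val = defaultdict(lambda: [None, None])


def e_sim_snapshot(i, j, t, snapsn, input_i, shared, mutex):
    key = (j, snapsn)
    if j > t + 1:                         # N01-N02: p_j has no owner
        with mutex:
            shared.safe_ag[key].propose(input_i)
        return shared.safe_ag[key].decide()
    if i == j:                            # N03-N06: q_i is the owner of p_j
        shared.arb_val[key][1] = input_i
        with mutex:
            win = shared.arbiter[key].arbitrate(is_owner=True)
        return input_i if win == 1 else shared.arb_val[key][0]
    with mutex:                           # N07-N10: q_i is not the owner
        shared.safe_ag[key].propose(input_i)
    shared.arb_val[key][0] = shared.safe_ag[key].decide()
    r = shared.arbiter[key].arbitrate(is_owner=False)
    return shared.arb_val[key][r]


t = 1
locks = {1: threading.Lock(), 2: threading.Lock()}   # one local mutex per simulator
shared = Shared()
owner_first = [e_sim_snapshot(1, 1, t, 1, "A", shared, locks[1]),
               e_sim_snapshot(2, 1, t, 1, "B", shared, locks[2])]
guest_first = [e_sim_snapshot(2, 1, t, 2, "B", shared, locks[2]),
               e_sim_snapshot(1, 1, t, 2, "A", shared, locks[1])]
unowned = [e_sim_snapshot(1, 3, t, 1, "X", shared, locks[1]),
           e_sim_snapshot(2, 3, t, 1, "Y", shared, locks[2])]
print(owner_first, guest_first, unowned)
```

In each scenario both simulators return the same value, which is the property the arbiter and the ARB_VAL pair are meant to preserve; the sketch says nothing about crashes or truly concurrent invocations.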

To summarize, if a simulator q_i crashes, it entails the crash of at most one simulated process. This is ensured thanks to the mutex 
algorithm. If the simulator q_i crashes, 1 ≤ i ≤ t + 1, as far as the simulated processes are concerned, it can entail either no crash at all (if 
q_i crashes outside a critical section), or the crash of p_i (if it crashes while executing arbitrate_{i,i}() inside the critical section at line N04), 
or the crash of a process p_j such that 1 ≤ j ≠ i ≤ t + 1 (this can occur only if q_j has crashed and was not the winner, and q_i crashes inside 
the critical section at line N07), or the crash of one of the processes p_{t+2}, ..., p_n (if it crashes at line N01 inside the critical section). 

t-Resilience vs wait-freedom Given a BG simulation algorithm where a simulated process p_j (1 ≤ j ≤ t + 1 ≤ n) can be blocked 
forever only if the simulator q_j crashes, Gafni shows in [12] that wait-freedom and t-resilience are equivalent for decision tasks (that 
paper also shows strong results on the equivalence between weak renaming and weak symmetry breaking). 


7 Proof of the extended BG simulation 

Lemma 1 A simulator can block the progression of only one simulated process at a time. 

operation e_sim_snapshot_{i,j}(): 

(01) sm_i ← MEM.snapshot(); 
(02) for each y : 1 ≤ y ≤ n do input_i[y] ← sm_i[s, y].value 
(03)      where ∀x : 1 ≤ x ≤ t + 1 : sm_i[s, y].sn ≥ sm_i[x, y].sn end for; 
(04) snap_sn_i[j] ← snap_sn_i[j] + 1; let snapsn = snap_sn_i[j]; 
(N01) if (j > t + 1) then enter_mutex; SAFE_AG[j, snapsn].propose_i(input_i); exit_mutex; 
(N02)      res ← SAFE_AG[j, snapsn].decide_i() 
(N03) elseif (i = j) then ARB_VAL[j, snapsn][1] ← input_i; 
(N04)      enter_mutex; win ← ARBITER[j, snapsn].arbitrate_{i,j}(); exit_mutex; 
(N05)      if (win = 1) then res ← input_i 
(N06)      else res ← ARB_VAL[j, snapsn][0] end if; 
(N07) else enter_mutex; SAFE_AG[j, snapsn].propose_i(input_i); exit_mutex; 
(N08)      ARB_VAL[j, snapsn][0] ← SAFE_AG[j, snapsn].decide_i(); 
(N09)      r ← ARBITER[j, snapsn].arbitrate_{i,j}(); 
(N10)      res ← ARB_VAL[j, snapsn][r] 
(N11) end if; 
(07) return (res). 

Figure 5: The operation e_sim_snapshot_{i,j}() executed by q_i to simulate mem.snapshot() issued by p_j 


Proof A simulator can block the simulation of a process only during an e_sim_snapshot() operation, when the simulator uses a 
safe_agreement (lines N01-N02 or N07-N08) or an arbiter object because it is its owner (line N04). All these invocations are placed in 
mutual exclusion. Thus a simulator can block the simulation of only a single process at a time. □ Lemma 1 


Lemma 2 The simulated process p_i is never blocked at the simulator q_i. 

Proof The e_sim_snapshot() operation, when invoked by simulator q_i for the simulated process p_j (line N03, i = j), does not include 
any wait statement and does not use a safe_agreement object. Due to the properties of the arbiter object type, it cannot be blocked 
during its invocation of arbitrate(). Thus, the simulated process p_i can never be blocked at simulator q_i. □ Lemma 2 


Lemma 3 Each simulator receives the decision value of at least n - t simulated processes. 

Proof Because at most t simulators may crash, and a simulator can block at most a single simulated process at a time (Lemma 1), each 
simulator can execute the code of at least n - t simulated processes without being blocked forever. Because the simulated algorithm is 
t-resilient, these n - t processes will then decide a value. □ Lemma 3 


Lemma 4 All simulators that return a value for the k-th snapshot of the simulated process pj return the same value. 

Proof If the simulated process p_j isn't owned by any simulator (j > t + 1), then because of the properties of the safe_agreement 
objects, the same snapshot is always returned (lines N01-N02 of Figure 5). 

If the owner of the simulated process p_j chooses the value it has computed for p_j's k-th snapshot, it has written this value in 
ARB_VAL[j, snapsn][1] (line N03), and is the winner of the arbiter object (line N05). All other simulators will then read its value 
(line N10). 

If the simulated process has an owner but another simulator chooses the value it has computed for p_j's k-th snapshot, this simulator has 
already agreed on a value with all other non-owner simulators (safe_agreement object, lines N07-N08) and is the winner of the arbiter 
object (lines N09-N10). All non-owner simulators will then write the same value in ARB_VAL[j, snapsn][0] (line N08) and the owner 
will read it (line N06). 

Thus, all simulators that return a value for the k-th snapshot of the simulated process p_j return the same value. □ Lemma 4 


Lemma 5 At most one decision value can be decided by a simulated process on any simulator. 

Proof Because every simulator computes the same value for any given snapshot and because the snapshot operations are the only 
non-deterministic parts of the code of the simulated processes, all simulators that decide a value for a given simulated process decide the 
same value. □ Lemma 5 

Lemma 6 The sequences of all writes and snapshots for each simulated process correspond to a correct execution of the simulated 
algorithm. 

Proof Every simulator that is not blocked while simulating a process simulates it in the same way (same values written and same 
snapshots read, Lemma 4). 

When simulator q_i executes e_sim_snapshot() for p_j (i.e., the simulation of a snapshot for p_j), it stores in its input_i variable the 
values written by the simulators that have advanced the most for each simulated process (Figure 3 and lines 01-03 of Figure 5). It can 
choose its own input_i snapshot value only if no other simulator has already ended the execution of this e_sim_snapshot() (Lemma 4 
implies that safe_agreement objects have a “memory” effect). Thus, for each e_sim_snapshot(), q_i returns an input value computed by 
itself or another simulator. Let us notice that, when this input value has been determined, no simulator had terminated its associated 
e_sim_snapshot(). (If this were not the case, that simulator would have provided the other simulators with its own input value.) Because 
processes are simulated deterministically, the input value returned contains the last value written by p_j as seen by q_i. This shows that 
the simulated process order is respected. 

To ensure that the simulation is correct, we then have to show that the writes and snapshots of all processes can be linearized. The 
linearization point of the writes is placed at line 03 of Figure 3 of the first simulator that executes it. The linearization point of the 
snapshots is placed at line 01 of Figure 5 of the simulator q_i that imposes its input_i value. 

Because the simulator that imposes its input_i value in an e_sim_snapshot() operation reads the most advanced values at the time 
of its snapshot (lines 02-03 of Figure 5), and because once a simulator finishes the execution of e_sim_snapshot(), the value for this 
e_sim_snapshot() cannot change (Lemma 4), the linearization corresponds to a linearization of a correct execution of the simulated 
algorithm. □ Lemma 6 
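The selection of the most advanced values (lines 02-03 of Figure 5) can be illustrated with a small Python sketch. The function name `most_advanced_input`, the representation of the snapshot as a list of rows of `(value, sn)` pairs, and the 0-based indices are assumptions made for this example.

```python
def most_advanced_input(sm):
    """sm[x][y] = (value, sn) last written by simulator x on behalf of process y.

    For each simulated process y, keep the value carrying the largest
    sequence number among all simulator rows.
    """
    n = len(sm[0])
    result = []
    for y in range(n):
        value, _sn = max((row[y] for row in sm), key=lambda pair: pair[1])
        result.append(value)
    return result


sm = [
    [("a1", 1), ("b2", 2), ("c0", 0)],   # row written by the first simulator
    [("a3", 3), ("b1", 1), ("c0", 0)],   # row written by the second simulator
]
input_i = most_advanced_input(sm)
print(input_i)
```

Ties between rows are harmless here because, as the proof notes, processes are simulated deterministically, so two rows with the same sequence number for a process are expected to hold the same value.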


Theorem 1 The extended BG simulation algorithms described in Figures 3 and 5 are correct. 

Proof Lemmas 2, 3, 5 and 6 show that the extended BG simulation algorithms presented here are correct. □ Theorem 1 


A Proof of the safe agreement object type 

Lemma 7 A single value is returned by invocations of decide(). 

Proof Because snapshot objects are linearizable, we can associate a date with each snapshot operation. Let t be the date of the first 
snapshot operation that returns an array sm_j containing an entry k such that sm_j[k].level = 2. This snapshot operation can be invoked 
either in a propose() or decide() operation. 

Any process p_j that takes a snapshot during its propose() operation after date t won’t set its level to 2 (line 04). Thus, the set of 
processes {p_k | SM[k].level = 2} won’t change after date t, and no process will get out of the loop at lines 05-06 before taking a snapshot 
after t. So, no two processes return different values. □ Lemma 7 

Lemma 8 If no process crashes while it executes the propose() operation, all the correct processes terminate their invocation of 
decide(). 

Proof For any given process p_i, before its invocation of propose() and after it, SM[i].level ≠ 1. The loop at line 04 is repeated only if 
∃j : SM[j].level = 1. Thus, if no process crashes during an invocation of propose(), all the correct processes terminate their invocation 
of decide(). □ Lemma 8 


Lemma 9 The decided value is a proposed value. 

Proof If an invocation of decide() terminates and returns a value, it returns a value that it has obtained through a snapshot at line 04, 
and such a value has been written at line 01. It follows that it is a proposed value. □ Lemma 9 


Theorem 2 The algorithm in Figure 1 respects the specifications given in Section 4.1. 

Proof Lemmas 7, 8 and 9 show that the algorithm in Figure 1 respects the specifications given in Section 4.1. □ Theorem 2 
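For readers who want something concrete to run, here is a minimal Python sketch of a safe_agreement object in the style the proofs above rely on (a level per process: 0 for idle or withdrawn, 1 for unstable, 2 for stable). It is reconstructed from those proofs and from the usual presentation of the BG safe agreement, not copied from Figure 1, so the line numbers quoted in Lemmas 7 to 9 refer to Figure 1 and not to this code. The class name, the lock used to imitate an atomic snapshot, the 0-based process indices, and the choice of the smallest stable index as the decided entry are assumptions.

```python
import threading
import time


class SafeAgreement:
    """Toy safe_agreement object for n processes (indices 0..n-1)."""

    def __init__(self, n):
        self._lock = threading.Lock()      # imitates an atomic snapshot object
        self._sm = [(None, 0)] * n         # one (value, level) entry per process

    def _write(self, i, entry):
        with self._lock:
            self._sm[i] = entry

    def _snapshot(self):
        with self._lock:
            return list(self._sm)

    def propose(self, i, v):
        self._write(i, (v, 1))             # announce the proposal (unstable)
        sm = self._snapshot()
        if any(level == 2 for _, level in sm):
            self._write(i, (v, 0))         # a stable value exists: withdraw
        else:
            self._write(i, (v, 2))         # otherwise become stable

    def decide(self):
        # Assumes the caller has already completed its own propose().
        while True:
            sm = self._snapshot()
            if all(level != 1 for _, level in sm):
                break
            time.sleep(0)                  # let other threads make progress
        # Return the stable value with the smallest process index.
        return next(value for value, level in sm if level == 2)


sa = SafeAgreement(n=3)
sa.propose(1, "b")
sa.propose(0, "a")
decided = sa.decide()
print(decided)
```

The loop in `decide()` is the only place where a caller can wait, and it waits only while some entry is at level 1, which is the situation Lemma 8 excludes when no process crashes during propose().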


B Proof of the arbiter object type 

Lemma 10 If the owner participates and is correct, then all correct participating processes terminate. 

Proof There is no loop and the only blocking statement of the algorithm is the wait statement at line 05 where a process p_i waits for 
a value to be assigned to WINNER. The owner (p_j) assigns a value to WINNER before it terminates (line 04). So, if the owner 
participates and is correct, all correct participating processes terminate. □ Lemma 10 


Lemma 11 If the owner does not participate, then all correct participating processes terminate. 

Proof Again, there is no loop and the only blocking statement of the algorithm is the wait statement at line 05. This statement 
is executed only if the process observes that the owner has started participating. So, if the owner does not participate, all correct 
participating processes terminate. □ Lemma 11 


Lemma 12 If a process terminates, then all correct participating processes terminate. 

Proof Again, there is no loop and the only blocking statement of the algorithm is the wait statement at line 05 (it waits for a value to be 
assigned to WINNER). If a process does not execute this wait statement, it assigns a value to WINNER. So, if a process terminates, 
all correct participating processes terminate. □ Lemma 12 


Lemma 13 All processes return the same value. 

Proof The only process that can assign a value different from 0 to WINNER is the owner (line 04). So, we only have to consider this 
case. If the owner assigns the value 1 to WINNER, it means that in the snapshot it has taken at line 02, it did not observe any other 
process (line 04). Because snapshots are linearizable and the owner announced that it started before taking the snapshot (line 01), all 
the other processes will see that the owner has started and will execute the wait statement (line 05) instead of assigning 0 to WINNER. 
Thus, all processes return the same value. □ Lemma 13 

Lemma 14 If the owner does not participate, the value returned is 0. 

Proof The only process that can assign a value different from 0 to WINNER is the owner (line 04). So, if the owner does not participate, 
the value of WINNER is 0, and all processes return this value. □ Lemma 14 

Lemma 15 If the owner participates alone, the value returned is 1. 

Proof If the owner does not observe any other participating process, it assigns the value 1 to WINNER. Thus, if the owner participates 
alone, it returns 1. □ Lemma 15 
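The behaviour argued in Lemmas 10 to 15 can be tried out with the following Python sketch of an arbiter. It is reconstructed from the steps mentioned in the proofs (announce participation, take a snapshot, let the owner assign WINNER, let a non-owner wait only if it saw the owner) and is not a transcription of Figure 2, so the line numbers in the lemmas do not apply to it. The class name `Arbiter`, the `is_owner` flag, the lock imitating an atomic snapshot, and the event used for the wait statement are assumptions.

```python
import threading


class Arbiter:
    """Toy arbiter: arbitrate() returns 1 if the owner wins and 0 otherwise."""

    def __init__(self):
        self._lock = threading.Lock()      # imitates an atomic snapshot object
        self._part = {"owner": False, "guest": False}
        self._winner = None                # None models the unassigned WINNER
        self._assigned = threading.Event()

    def _assign(self, value):
        self._winner = value
        self._assigned.set()

    def arbitrate(self, is_owner):
        role = "owner" if is_owner else "guest"
        with self._lock:
            self._part[role] = True        # announce participation first
        with self._lock:
            seen = dict(self._part)        # then snapshot the announcements
        if is_owner:
            # The owner wins only if it observed no other participant.
            self._assign(0 if seen["guest"] else 1)
        elif seen["owner"]:
            # The owner has started: wait until WINNER is assigned.
            self._assigned.wait()
        else:
            # The owner was not seen: the non-owners win.
            self._assign(0)
        return self._winner


first = Arbiter()
owner_then_guest = [first.arbitrate(True), first.arbitrate(False)]

second = Arbiter()
guest_then_owner = [second.arbitrate(False), second.arbitrate(True)]
print(owner_then_guest, guest_then_owner)
```

Only sequential scenarios are exercised here. A non-owner that sees the owner's announcement blocks until some value is assigned, which is why termination in that case depends on the owner being correct or on another participant having assigned WINNER.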

Theorem 3 The algorithm in Figure 2 respects the specifications given in Section 4.2. 

Proof Lemmas 10, 11, 12, 13, 14 and 15 show that the algorithm in Figure 2 respects the specifications given in Section 4.2. □ Theorem 3 